Novel Pre-Processing Technique for Web Log Mining by Removing Global Noise, Cookies and Web Robots
Abstract
Today, human life has become dependent on the internet, and almost anything can be searched for online. Web pages usually contain a large amount of information that may not interest the user because it is not part of the page's main content. Web Usage Mining (WUM) applies data mining, artificial intelligence and related techniques to web data in order to forecast users' visiting behaviors and infer their interests by investigating samples. WUM is directly involved in applications such as e-commerce, e-learning, Web analytics and information retrieval. Web log data is one of the major sources of information about the links users visit, their browsing patterns and the time they spend on a particular page or link. This information can be used in applications such as adaptive web sites, modified services, customer summaries, pre-fetching and the generation of attractive web sites. There are a variety of problems with existing web usage mining approaches; in particular, existing algorithms are difficult to apply in practice. This paper continues the line of research on Web access log analysis, which aims to analyze the patterns of web site usage and the features of user behavior. Normal log data is very noisy and unclear, so it is vital to preprocess it for an efficient web usage mining process. Preprocessing comprises three phases: data cleaning, user identification, and pattern discovery and pattern analysis. Log data is characteristically noisy
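The data cleaning and user identification phases named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which the abstract does not spell out. It assumes the log is in Apache Combined Log Format, treats image, stylesheet and script requests, non-GET requests and non-2xx responses as noise, flags a host as a robot if it requested /robots.txt or its user-agent contains a common crawler keyword, and identifies a user by the (IP address, user-agent) pair. All names below (LOG_PATTERN, STATIC_SUFFIXES, ROBOT_HINTS, clean_and_identify) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed input: Apache Combined Log Format
# host ident user [time] "METHOD url PROTO" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Illustrative heuristics (assumptions, not taken from the paper)
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_HINTS = ("bot", "crawler", "spider")


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not parse."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def url_path(entry):
    """Lower-cased URL without its query string."""
    return entry["url"].split("?", 1)[0].lower()


def is_robot(entry):
    """Flag requests for /robots.txt or with a crawler-like user-agent."""
    agent = (entry["agent"] or "").lower()
    return url_path(entry) == "/robots.txt" or any(h in agent for h in ROBOT_HINTS)


def is_noise(entry):
    """Flag embedded resources, non-GET requests and unsuccessful responses."""
    return (
        url_path(entry).endswith(STATIC_SUFFIXES)
        or entry["method"] != "GET"
        or not entry["status"].startswith("2")
    )


def clean_and_identify(lines):
    """Drop noise and robot traffic, then group page views per user."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    # Simplification: every request from a host seen acting as a robot is dropped.
    robot_hosts = {e["host"] for e in entries if is_robot(e)}
    users = defaultdict(list)
    for e in entries:
        if e["host"] in robot_hosts or is_noise(e):
            continue
        users[(e["host"], e["agent"])].append(e["url"])
    return dict(users)
```

Dropping every request from a flagged host is a deliberate simplification: several users behind one proxy share an IP address, so a production pipeline would likely combine this with session-level heuristics.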
Similar resources
ARPN Journal of Science and Technology::An Effective Web Usage Analysis using Fuzzy Clustering
Nowadays, the internet is a useful source of information in everyone’s daily activities. This has driven enormous growth of the World Wide Web in its volume of traffic and in the size and complexity of websites. Web Usage Mining (WUM) applies data mining, artificial intelligence and related techniques to web data in order to forecast users’ visiting behaviors and obtain their intere...
Representing a method to identify and contrast with the fraud which is created by robots for developing websites’ traffic ranking
With the expansion of the Internet and the Web, communication and information gathering between individuals have shifted away from their traditional forms and onto web sites. The World Wide Web also offers a great opportunity for businesses to improve their relationships with clients and expand their marketplace in the online world. Businesses use a criterion called traffic ranking to determine their si...
Hybrid Model for Preprocessing and Clustering of Web Server Log
As the World Wide Web (WWW) grows in both complexity and traffic volume, it has become very important to analyze this web traffic and how users use web sites. Web usage mining is a main research area in web mining focused on learning about web users and their interaction with web sites. The information like server log, ...
Preprocessing: A Prerequisite for Discovering Patterns in WUM Process
Web log data is usually diverse and voluminous. This data must be assembled into a consistent, integrated and comprehensive view, in order to be used for pattern discovery. Without properly cleaning, transforming and structuring the data prior to the analysis, one cannot expect to find meaningful patterns. As in most data mining applications, data preprocessing involves removing and filtering r...
Preprocessing: A Prerequisite for Discovering Patterns in Web Usage Mining Process
Web log data is usually diverse and voluminous. This data must be assembled into a consistent, integrated and comprehensive view, in order to be used for pattern discovery. Without properly cleaning, transforming and structuring the data prior to the analysis, one cannot expect to find meaningful patterns. As in most data mining applications, data preprocessing involves removing and filtering r...